We consider nonparametric estimation of a regression function
for a situation where precisely measured predictors are used to
estimate the regression curve for coarsened, that is, less precise or
contaminated predictors. Specifically, while one has available a
sample $(W_1, Y_1), \ldots, (W_n, Y_n)$ of independent and identically distributed
data, representing observations with precisely measured predictors,
where $E(Y_i \mid W_i) = g(W_i)$, instead of the smooth regression function
$g$, the target of interest is another smooth regression function $m$ that
pertains to predictors $X_i$ that are noisy versions of the $W_i$. Our
target is then the regression function $m(x) = E(Y \mid X = x)$, where $X$
is a contaminated version of $W$, that is, $X = W + \delta$. It is assumed
that either the density of the errors is known, or replicated data are
available resembling, but not necessarily the same as, the variables
$X$. In either case, and under suitable conditions, we obtain $\sqrt{n}$-rates
of convergence of the proposed estimator and its derivatives, and
establish a functional limit theorem. Weak convergence to a Gaussian
limit process implies pointwise and uniform confidence intervals
and $\sqrt{n}$-consistent estimators of extrema and zeros of $m$. It is shown
that these results are preserved under more general models in which
$X$ is determined by an explanatory variable. Finite sample performance
is investigated in simulations and illustrated by a real data
example.
1. Introduction.

1.1. Motivation and models. In this paper, we consider nonparametric
estimation of a regression function in the framework of a novel
errors-in-variables problem. In the classical errors-in-variables problem, the interest
is to estimate a regression function $m$, where

\[ Y = m(G) + \varepsilon, \]

and a sample $(F_1, Y_1), \ldots, (F_n, Y_n)$ of independent and identically distributed
(i.i.d.) data is available, with $F_i = G_i + \delta_i$, where $G$ and $\delta$ are independent
random variables and the distribution of $\delta$ is known. References include Fan,
Truong and Wang [9], Fan and Masry [7], Fan and Truong [8], Stefanski and
Cook [16], Carroll, Ruppert and Stefanski [4], Carroll, Maca and Ruppert [3],
Taupin [17], Devanarayan and Stefanski [5], Ioannides and Matzner-Løber
[12], Linton and Whang [14] and Carroll and Hall [2].

The situation we consider here is different: we assume that an i.i.d. sample
$(W_1, Y_1), \ldots, (W_n, Y_n)$ is observed, where

(1.1) \[ Y_i = g(W_i) + \varepsilon_i \quad \text{for } 1 \le i \le n, \]

with independent errors $\varepsilon_i$ with mean zero and finite variance. Instead of
estimating the regression function $g(w) = E(Y \mid W = w)$ generating the
observations, the goal is to estimate the target regression function
$m(x) = E(Y \mid X = x)$, which differs from $g$, as $X$ is a contaminated (coarsened)
version of $W$.

Specifically, $X \sim f_X$ and $X = W + \delta$, where $\delta \sim f_\delta$ represents a random
distortion, and $W$ and $\delta$ are independent random variables. We refer to $X$
as a coarsened predictor of $Y$. In Section 1.3 we shall note that the model for
$X$ can be generalized, without altering the main properties of our methods,
to the situation where $X$ is a proxy for a variable $T$ related to $W$, provided
we have additional data to infer the relationship between $T$ and $X$.

The motivating idea is that the sample $(W_1, Y_1), \ldots, (W_n, Y_n)$, where one
has precise predictors, is hard to obtain, and therefore future values of $Y$
will be predicted from easier-to-obtain contaminated observations $X$ of $W$.
This type of problem arises in situations where it is expensive or involved
to measure $W$ accurately, so that, in routine applications, only the
contaminated and less precise predictors $X$ are available. At the same time,
a training set is available containing more precise predictors. For example,
if we have a sample of repeated contaminated observations of the
predictor for several individuals, the averaged observations $W_i = \bar{X}_i$ will provide
relatively accurate measurements of the predictor.

The problem we address is how to use the information in the training
sample, with its accurate measurements, to predict a future response $Y$ from
a future contaminated predictor $X$. One of our central findings is that this
coarsening of the predictor has the consequence of accelerating the
convergence of the proposed estimator of $m$ from the usual nonparametric rate,
strictly slower than $\sqrt{n}$, to a parametric $\sqrt{n}$-rate, even if the target
regression function is known only to be smooth and does not follow any particular
parametric model.

In the setting of (1.1), $m$ is generally not identifiable unless we know $f_\delta$.
The latter assumption is commonly made in errors-in-variables problems.
See, for example, Stefanski and Carroll [15] and Fan [6]. However, if we have
additional data directly on $\delta$, or if the data at (1.1) are replicated, then we
can identify $m(x)$ without knowing $f_\delta$. In either of these settings we might
have a parametric model for $f_\delta$, or we might wish to treat inference about
$f_\delta$ from a nonparametric viewpoint. In order to show that estimation of $m$
is a semiparametric problem, even if $f_\delta$ is not known and we treat it
nonparametrically, we shall consider a more general, relatively "uninformative"
type of replication, where we observe only

(1.2) \[ U_{ij} = V_i + \delta_{ij} \quad \text{for } 1 \le j \le n_i \text{ and } 1 \le i \le N. \]

Here, $V_1, \ldots, V_N$ are arbitrary random variables, $\delta_{11}, \ldots, \delta_{N n_N}$ are mutually
independent, the $\delta_{ij}$ are all distributed as $\delta$, and it is assumed that each
$n_i \ge 2$. Our results demonstrate that it is possible to attain $\sqrt{n}$-consistency
without making joint assumptions about the data at (1.1) and (1.2). In
particular, it is not necessary to suppose that the $U_{ij}$ are independent of the
$(W_i, \delta_i)$ or that the $V_i$ are independent of the $\delta_{ij}$. A direct application of the
model in (1.2) is where the $U_{ij}$ are replicated measurements of $X_i$, and $V_i = W_i$.

1.2. Estimators. First we express $m$ as a ratio, where each component
can be estimated separately. Since $m(x) = E(Y \mid X = x) = E(g(W) \mid X = x)$,
then

(1.3) \[ m(x) = \frac{\int g(w) f_{X|W}(x \mid w) f_W(w)\,dw}{f_X(x)} = \frac{\int g(w) f_\delta(x - w) f_W(w)\,dw}{\int f_\delta(x - w) f_W(w)\,dw} = \frac{\varphi(x)}{\psi(x)}, \]

where we define $\psi(x) = \int f_\delta(x - w) f_W(w)\,dw = E(f_\delta(x - W))$ and

\[ \varphi(x) = \int g(w) f_\delta(x - w) f_W(w)\,dw = E(g(W) f_\delta(x - W)) = E(Y f_\delta(x - W)). \]



If the data $(W_i, Y_i)$ are generated by the model (1.1), and $f_\delta$ is assumed
known, then the representations above motivate the estimators

\[ \hat\varphi(x) = n^{-1} \sum_{i=1}^{n} Y_i f_\delta(x - W_i), \]



4 A. DELAIGLE, P. HALL AND H.-G. MULLER 

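\[ \hat\psi(x) = n^{-1} \sum_{i=1}^{n} f_\delta(x - W_i) \]

of $\varphi(x)$ and $\psi(x)$, respectively, leading to the estimator

(1.4) \[ \hat m(x) = \frac{\sum_{i=1}^{n} Y_i f_\delta(x - W_i)}{\sum_{i=1}^{n} f_\delta(x - W_i)} = \frac{\hat\varphi(x)}{\hat\psi(x)} \]

of $m(x)$. An attractive feature of $\hat m$ is that it does not require a smoothing
parameter.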
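
The ratio estimator at (1.4) can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function names `gaussian_density` and `m_hat`, the Gaussian choice for the known error density, and the toy data are all assumptions made for the example.

```python
import numpy as np


def gaussian_density(u, sigma):
    """Density of N(0, sigma^2) at u (an assumed choice of known error density)."""
    return np.exp(-0.5 * (u / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def m_hat(x, W, Y, f_delta):
    """Ratio estimator: sum_i Y_i f_delta(x - W_i) / sum_i f_delta(x - W_i)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    W = np.asarray(W, dtype=float)
    Y = np.asarray(Y, dtype=float)
    K = f_delta(x[:, None] - W[None, :])      # shape (len(x), n)
    phi_hat = (K * Y[None, :]).mean(axis=1)   # estimates phi(x)
    psi_hat = K.mean(axis=1)                  # estimates psi(x)
    return phi_hat / psi_hat


# Toy data (hypothetical): precise predictors W and responses Y = g(W) + error.
rng = np.random.default_rng(0)
W = rng.uniform(0.0, 1.0, size=200)
Y = 3.0 * W + rng.normal(0.0, 0.1, size=200)
f_delta = lambda u: gaussian_density(u, sigma=0.1)
estimate = m_hat(np.linspace(0.1, 0.9, 5), W, Y, f_delta)
```

Because the weights $f_\delta(x - W_i)$ are nonnegative, each value of the sketch is a weighted average of the $Y_i$; no bandwidth is chosen, since the known error density plays the role of the kernel.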

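When additional data following (1.2) are available, we propose a
Fourier-inversion approach to estimating $\varphi$ and $\psi$, as follows. Assume that $\delta$ has a
symmetric distribution, with positive characteristic function $f_\delta^{\mathrm{ft}}$,

(1.5) \[ f_\delta^{\mathrm{ft}}(t) = \Re f_\delta^{\mathrm{ft}}(t) > 0 \quad \text{for all real } t, \]

where the superscript ft denotes Fourier transform, and the Fourier
transform of a function $f$ is given by $f^{\mathrm{ft}}(t) = \int f(x) e^{itx}\,dx$. The real part of $f^{\mathrm{ft}}$ is
denoted by $\Re f^{\mathrm{ft}}$. (Our methods can be generalized to the case of asymmetric
error distributions, using techniques borrowed from Li and Vuong [13].) Our
estimator of $f_\delta^{\mathrm{ft}}$ is

(1.6) \[ \hat f_\delta^{\mathrm{ft}}(t) = \Biggl| \frac{1}{M} \sum_{j=1}^{N} \sum_{1 \le k_1 < k_2 \le n_j} \exp[it(U_{jk_1} - U_{jk_2})] \Biggr|^{1/2}, \]

where $M = \frac{1}{2} \sum_{j=1}^{N} n_j(n_j - 1)$. (Here and below, $\hat f^{\mathrm{ft}}$ denotes an estimator
of the Fourier transform of $f$, not the Fourier transform of an estimator $\hat f$
of $f$.)

Writing $f_W$ for the density of $W$, estimators of the Fourier transforms
$f_W^{\mathrm{ft}}$ and $(f_W g)^{\mathrm{ft}}$ of $f_W$ and $f_W g$ are respectively given by

(1.7) \[ \hat f_W^{\mathrm{ft}}(t) = \frac{1}{n} \sum_{j=1}^{n} \exp(itW_j), \qquad \widehat{(f_W g)}^{\mathrm{ft}}(t) = \frac{1}{n} \sum_{j=1}^{n} Y_j \exp(itW_j). \]

Estimators of $\psi$ and $\varphi$ based on Fourier inversion are then obtained as

(1.8) \[ \tilde\psi(x) = \frac{1}{2\pi} \int_{|t| \le T_n} \hat f_W^{\mathrm{ft}}(t)\, \hat f_\delta^{\mathrm{ft}}(t)\, e^{-itx}\,dt, \qquad \tilde\varphi(x) = \frac{1}{2\pi} \int_{|t| \le T_n} \widehat{(f_W g)}^{\mathrm{ft}}(t)\, \hat f_\delta^{\mathrm{ft}}(t)\, e^{-itx}\,dt, \]

where $T_n$ is a smoothing parameter. Our estimator of $m$ is $\tilde m = \tilde\varphi / \tilde\psi$.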
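
A rough numerical sketch of the Fourier-inversion estimator at (1.6)-(1.8) follows. It is only an illustration under stated assumptions: the names `f_delta_ft_hat` and `m_tilde`, the trapezoidal quadrature on a grid of `n_grid` points, the value of the truncation point `T_n`, and the simulated replicate data are all hypothetical choices, not part of the original method description.

```python
import numpy as np


def f_delta_ft_hat(t, U):
    """Estimate the characteristic function of the error from replicates.

    U is a list of 1-D arrays; U[j] holds the n_j >= 2 replicates of individual j.
    Differences within an individual cancel V_j, leaving differences of errors.
    """
    t = np.asarray(t, dtype=float)
    total = np.zeros(t.shape, dtype=complex)
    M = 0
    for u in U:
        u = np.asarray(u, dtype=float)
        k1, k2 = np.triu_indices(len(u), k=1)   # all pairs k1 < k2
        diffs = u[k1] - u[k2]
        total += np.exp(1j * t[:, None] * diffs[None, :]).sum(axis=1)
        M += diffs.size
    return np.sqrt(np.abs(total / M))


def m_tilde(x, W, Y, U, T_n, n_grid=801):
    """Ratio of truncated Fourier inversions over |t| <= T_n."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    W = np.asarray(W, dtype=float)
    Y = np.asarray(Y, dtype=float)
    t = np.linspace(-T_n, T_n, n_grid)
    dt = t[1] - t[0]
    weights = np.full(n_grid, dt)
    weights[[0, -1]] = dt / 2.0                 # trapezoidal rule (assumption)
    e_itW = np.exp(1j * t[:, None] * W[None, :])
    f_W_ft = e_itW.mean(axis=1)                 # empirical transform of f_W
    f_Wg_ft = (e_itW * Y[None, :]).mean(axis=1) # empirical transform of f_W g
    f_d_ft = f_delta_ft_hat(t, U)
    e_mitx = np.exp(-1j * x[:, None] * t[None, :])
    psi = np.real(e_mitx @ (weights * f_W_ft * f_d_ft)) / (2.0 * np.pi)
    phi = np.real(e_mitx @ (weights * f_Wg_ft * f_d_ft)) / (2.0 * np.pi)
    return phi / psi


# Hypothetical data: n = 200 training pairs, N = 400 individuals with 2 replicates.
rng = np.random.default_rng(1)
W = rng.uniform(0.0, 1.0, size=200)
Y = 3.0 * W + rng.normal(0.0, 0.1, size=200)
U = [v + rng.normal(0.0, 0.1, size=2) for v in rng.uniform(0.0, 1.0, size=400)]
estimate = m_tilde([0.4, 0.6], W, Y, U, T_n=15.0)
```

In practice the truncation point would be chosen with the error smoothness in mind; the value 15 above is arbitrary.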
1.3. Generalizations. The main features of our approach also apply to
the more general case where $X = p(T \mid \theta) + \delta$, that is, where $W = p(T \mid \theta)$
for a random variable $T$, and $(T, Y)$ rather than $(W, Y)$ is observed in a subset of the
available data. Here $p(\cdot \mid \theta)$ is a parametric model, determined by the finite
parameter $\theta$.

In this setting, we would ideally take $W_i = p(T_i \mid \theta)$. However, in most
cases, we have to settle instead for $\hat W_i = p(T_i \mid \hat\theta)$, where $\hat\theta$ is a $\sqrt{n}$-consistent
estimator of $\theta$, computed by least squares from data $(T_i', X_i')$, with the same
distribution as $(T, X)$, and related by $X_i' = p(T_i' \mid \theta) + \delta_i'$, for $1 \le i \le r$, say.
The most important special case is that where $p$ is linear: $p(t \mid \theta) = \theta^{(1)} + \theta^{(2)} t$,
with $\theta = (\theta^{(1)}, \theta^{(2)})$ denoting a vector of length 2.

In this model, the variable $X$ typically represents a proxy for the
variable $T$, where $T$ often is not available in applications, for example because it is too
costly to measure. In some applications, however, we are
able to observe $(T_i, X_i, Y_i)$ for $1 \le i \le n$ in a "training set," where $r = n$.
We then propose to use the estimator $\hat m$, rather than a more conventional
nonparametric regression based on $(X_i, Y_i)$, since it is more accurate, as we
will demonstrate. In some cases the training set $(T_i', X_i')$ might be genuinely
different from $(T_i, X_i)$. For example, $(T_i', X_i')$ might represent external data.

In the case where $X = p(T \mid \theta) + \delta$ the estimators $\hat m$ and $\tilde m$ change only in
that we replace $W_i$ by $\hat W_i$ at each appearance. Under appropriate regularity
conditions the main properties of $\hat m$ and $\tilde m$, and in particular their
$\sqrt{n}$-consistency [provided $n = O(r)$], do not change. This point will be discussed
in Section 2.
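
For the linear special case, the construction of the surrogate predictors can be sketched as follows. This is a minimal illustration under assumptions: the helper names `fit_linear_proxy` and `predicted_W` and the noise-free toy data are hypothetical, and ordinary least squares via `numpy.linalg.lstsq` stands in for whatever least squares routine is used in practice.

```python
import numpy as np


def fit_linear_proxy(T_train, X_train):
    """Least squares fit of X' = theta1 + theta2 * T' + error; returns (theta1, theta2)."""
    T_train = np.asarray(T_train, dtype=float)
    X_train = np.asarray(X_train, dtype=float)
    design = np.column_stack([np.ones_like(T_train), T_train])
    theta, *_ = np.linalg.lstsq(design, X_train, rcond=None)
    return theta


def predicted_W(T, theta):
    """Surrogate predictor W_hat_i = theta1_hat + theta2_hat * T_i."""
    return theta[0] + theta[1] * np.asarray(T, dtype=float)


# Noise-free toy data (hypothetical) so that the fit recovers the coefficients.
T_train = np.linspace(0.0, 1.0, 11)
X_train = 1.0 + 2.0 * T_train
theta_hat = fit_linear_proxy(T_train, X_train)
W_hat = predicted_W([0.0, 0.5, 1.0], theta_hat)
```

The values `W_hat` would then replace the $W_i$ wherever they appear in the estimators of Section 1.2.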

2. Asymptotic results. 

2.1. Case where $f_\delta$ is assumed known. Here we discuss asymptotic
properties of the estimator defined at (1.4). A central result is the weak
convergence of a suitably scaled estimator process, with $\sqrt{n}$-scaling, to a Gaussian
limit process in the location argument $x$. This result (Theorem 1 below)
implies, among other matters, pointwise and uniform limits, local and
simultaneous confidence bands, and convergence of estimated extrema locations.

We assume throughout Section 2 that the distribution corresponding to
$f_\delta$ is absolutely continuous, and in Section 2.1 that $f_\delta$ has a bounded
derivative. In Section 2.1 it is not necessary to suppose that the densities $f_W$ or
$f_{W,Y}$ exist, although it is convenient to use the notation $f_W$ and $f_{W,Y}$ when
introducing the quantities needed to state and derive our results. Indeed,
the differential elements $f_W(w)\,dw$ and $f_{W,Y}(w, y)\,dw\,dy$ may be interpreted
as $F_W(dw)$ and $F_{W,Y}(dw, dy)$, respectively; the distributions need not be
absolutely continuous.
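Given an integer $\nu \ge 0$, and assuming all quantities are well defined, let
$\varphi$ and $\psi$ be as in Section 1 and define

\[ h(x, w) = f_\delta^{(\nu)}(x - w), \qquad \alpha(x) = \iint y\, h(x, w) f_{W,Y}(w, y)\,dw\,dy, \qquad \beta(x) = \int h(x, w) f_W(w)\,dw. \]
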
Below, the notation $D$ denotes a compact set on which we shall estimate
$\varphi$ and $\psi$. The following conditions, indexed by $\nu = 0$ or 1, will be used to
prove our results. For $\nu = 1$ they can be relaxed to an assertion about the
modulus of continuity for the corresponding quantity when $\nu = 0$; we impose
the more stringent condition only for simplicity and brevity.

(A$_{\nu,1}$) (boundedness of $f_\delta^{(\nu)}$) $\sup_{x,y \in \mathbb{R}} |h(x, y)| < \infty$;

(A$_{\nu,2}$) (smoothness of $f_\delta^{(\nu)}$) $h(x, w)$ is an integrable function which is
uniformly Lipschitz continuous in $x$, that is, $\sup_w |h(x_1, w) - h(x_2, w)| \le L|x_1 - x_2|$,
for a constant $L > 0$;

(A$_3$) (boundedness of $\psi$) $\inf_{x \in D} |\psi(x)| = c_\psi > 0$;

(A$_4$) (finiteness of moments) $\int |y| f_Y(y)\,dy < \infty$ and $\int y^2 f_Y(y)\,dy < \infty$.

Note in particular that conditions (A$_{\nu,1}$) and (A$_4$) guarantee that all the
quantities defined above exist, and $\alpha$ and $\beta$ satisfy $\sup_{x \in D} |\alpha(x)| < \infty$ and
$\sup_{x \in D} |\beta(x)| < \infty$.

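Let $\Rightarrow$ denote weak convergence in $C(D)$ and define

\[ \varphi_1(x_1, x_2) = E\{Y^2 f_\delta(x_1 - W) f_\delta(x_2 - W)\}, \quad \psi_1(x_1, x_2) = E\{f_\delta(x_1 - W) f_\delta(x_2 - W)\}, \quad \mu(x_1, x_2) = E\{Y f_\delta(x_1 - W) f_\delta(x_2 - W)\}. \]

Our main result is a functional limit theorem for the proposed estimator.
(All proofs are deferred to Section 5.)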

Theorem 1. Under conditions (A$_{\nu,1}$), (A$_{\nu,2}$) for $\nu = 0, 1$, (A$_3$) and
(A$_4$), we have that, for the process $Z_n(x) = \sqrt{n}(\hat m(x) - m(x))$, $Z_n \Rightarrow Z$ on
$C(D)$, where $Z$ is a Gaussian process with zero mean and covariance

\[ \operatorname{cov}\{Z(x_1), Z(x_2)\} = \frac{\varphi_1(x_1, x_2)}{\psi(x_1)\psi(x_2)} + \frac{\psi_1(x_1, x_2)\varphi(x_1)\varphi(x_2)}{\psi^2(x_1)\psi^2(x_2)} - \frac{\mu(x_1, x_2)\{\varphi(x_1)\psi(x_2) + \varphi(x_2)\psi(x_1)\}}{\psi^2(x_1)\psi^2(x_2)}, \]

for $x_1, x_2 \in D$.
The correlation structure for estimates at points $x_1 \ne x_2$ is seen not to
vanish asymptotically, in contrast to the well-known behavior of local smoothing
estimators where estimates at different points become asymptotically
uncorrelated as bandwidths and windows converge to zero. Define
$\hat\mu(x_1, x_2) = n^{-1} \sum_{i=1}^{n} Y_i f_\delta(x_1 - W_i) f_\delta(x_2 - W_i)$,
$\hat\psi_1(x_1, x_2) = n^{-1} \sum_{i=1}^{n} f_\delta(x_1 - W_i) f_\delta(x_2 - W_i)$ and
$\hat\varphi_1(x_1, x_2) = n^{-1} \sum_{i=1}^{n} Y_i^2 f_\delta(x_1 - W_i) f_\delta(x_2 - W_i)$. Particular
consequences of Theorem 1 include the properties
$\sup_{x \in D} \sqrt{n}(\hat m(x) - m(x)) \to \sup_{x \in D} Z(x)$ and
$\sqrt{n}(\hat m(x) - m(x)) \to N(0, V(x))$ in distribution as $n \to \infty$, where
$V(x) = \operatorname{cov}(Z(x), Z(x))$ is estimated uniformly and $\sqrt{n}$-consistently by
$\hat V(x) = \hat\varphi_1(x, x)\hat\psi^{-2}(x) + \hat\varphi^2(x)\hat\psi_1(x, x)\hat\psi^{-4}(x) - 2\hat\varphi(x)\hat\mu(x, x)\hat\psi^{-3}(x)$, in the sense
that $\sup_{x \in D} |\hat V(x) - V(x)| = O_p(n^{-1/2})$. It follows that an asymptotic
$(1 - \alpha)$-level confidence interval for $m(x)$ has endpoints
$\hat m(x) \pm n^{-1/2}\, \hat V(x)^{1/2}\, \Phi^{-1}(1 - \alpha/2)$, where $\Phi$ denotes the standard normal distribution function.
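
The pointwise interval can be sketched numerically as below. This is an illustration under assumptions rather than the authors' code: the function name `m_hat_with_ci`, the Gaussian error density and the toy data are hypothetical, and the half-width is taken to be $n^{-1/2}\hat V(x)^{1/2}\Phi^{-1}(1 - \alpha/2)$, which is what the $\sqrt{n}$-normalized limit $N(0, V(x))$ implies.

```python
import numpy as np
from statistics import NormalDist


def m_hat_with_ci(x, W, Y, f_delta, level=0.95):
    """Return (m_hat, lower, upper) at each x using the plug-in variance V_hat."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    W = np.asarray(W, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = len(W)
    K = f_delta(x[:, None] - W[None, :])
    phi = (K * Y).mean(axis=1)
    psi = K.mean(axis=1)
    phi1 = (K ** 2 * Y ** 2).mean(axis=1)   # phi_1_hat(x, x)
    psi1 = (K ** 2).mean(axis=1)            # psi_1_hat(x, x)
    mu = (K ** 2 * Y).mean(axis=1)          # mu_hat(x, x)
    V = phi1 / psi ** 2 + phi ** 2 * psi1 / psi ** 4 - 2.0 * phi * mu / psi ** 3
    V = np.maximum(V, 0.0)                  # guard against rounding below zero
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    half_width = z * np.sqrt(V / n)
    m = phi / psi
    return m, m - half_width, m + half_width


# Toy data (hypothetical) with a Gaussian error density of standard deviation 0.1.
rng = np.random.default_rng(2)
W = rng.uniform(0.0, 1.0, size=200)
Y = 3.0 * W + rng.normal(0.0, 0.1, size=200)
f_delta = lambda u: np.exp(-0.5 * (u / 0.1) ** 2) / (0.1 * np.sqrt(2.0 * np.pi))
m, lower, upper = m_hat_with_ci([0.3, 0.5, 0.7], W, Y, f_delta)
```

Algebraically, $\hat V(x)$ equals $n^{-1}\sum_i f_\delta(x - W_i)^2 (Y_i - \hat m(x))^2 / \hat\psi(x)^2$, so it is nonnegative up to rounding error.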

Semiparametric efficiency of $\hat m$ can be established, in regular cases where
$f_\delta(x - w)$ is monotone in $x$ for $w$ in the support of $W$, by considering the
following simpler problem. Suppose we observe independent and identically
distributed pairs $(R_1, S_1), \ldots, (R_n, S_n)$, where $R_i > 0$ and $S_i = p(R_i) + \varepsilon_i$, with
$p$ a smooth function and $\varepsilon_i$ independent of $R_i$ and distributed as $N(0, \sigma^2)$.
Consider the problem of estimating $(\theta_1, \theta_2) = (E(R), E\{R\,p(R)\})$ from these
data. The estimator $(\hat\theta_1, \hat\theta_2) = (n^{-1} \sum_{i=1}^{n} R_i,\; n^{-1} \sum_{i=1}^{n} R_i S_i)$ is
asymptotically normally distributed and semiparametric efficient in this problem, and
thus $\hat\theta_2/\hat\theta_1$ is semiparametric efficient for $\theta_2/\theta_1$. (The proof follows via
Examples 3.2.1 and 3.3.4, and Propositions 3.3.1 and A.5.2, of Bickel et al. [1].) We
may identify $m(x)$ and $\hat m(x)$ with $\theta_2/\theta_1$ and $\hat\theta_2/\hat\theta_1$, respectively, by taking
$R_i = f_\delta(x - W_i)$ and $p(r) = g\{f_{\delta,x}^{-1}(r)\}$, where $f_{\delta,x}(w) = f_\delta(x - w)$.

Under additional regularity conditions, Theorem 1 continues to hold,
although with an altered covariance structure for the limiting process $Z$, in
the more general setting described in Section 1.3. There we observe $T_i$, in
the setting of an unknown parameter $\theta$, rather than $W_i = p(T_i \mid \theta)$, and $W_i$
is replaced by $\hat W_i = p(T_i \mid \hat\theta)$ in the definition of $\hat m$. If the model $p(\cdot \mid \theta)$ is
linear, then the only additional assumptions needed are two bounded
derivatives of $f_\delta$, and $E(T^2) < \infty$, where $T$ denotes a generic $T_i$. See Section 5.3
for an outline proof.

2.2. Case where $f_\delta$ is estimated from replicated data. The conditions
imposed below [see particularly (2.2)] imply that the distributions of $W$ and
$\delta$ are absolutely continuous, and in particular that the respective densities
$f_W$ and $f_\delta$ are square-integrable. We shall assume that

(2.1) \[ \max_i n_i < \infty, \qquad n = o(N), \qquad n^{1/(2(\lambda + \lambda_\delta - 1))} \ll T_n \ll N^{1/(2(1 + \lambda_\delta))}, \]

where $a_n \ll b_n$ for positive sequences $a_n$ and $b_n$ means that $a_n/b_n \to 0$ as
$n \to \infty$; and that, for constants $\lambda, \lambda_\delta > 0$ satisfying

(2.2) \[ \lambda > \lambda_\delta + 1 \quad \text{and} \quad \lambda_\delta > 1, \]

we have

(2.3) \[ |(f_W g)^{\mathrm{ft}}(t)| + |f_W^{\mathrm{ft}}(t)| \le \text{const.}\,|t|^{-\lambda} \ \text{for all } t, \qquad f_\delta^{\mathrm{ft}}(t) > 0 \ \text{for all } t, \qquad |f_\delta^{\mathrm{ft}}(t)| \asymp |t|^{-\lambda_\delta} \ \text{as } t \to \infty. \]

The second part of (2.1) asks that there be an order of magnitude more
values of $U_{ij}$, at (1.2), than there are pairs $(W_i, Y_i)$, at (1.1). Conditions
(2.2) and (2.3) ask that $f_\delta$ be sufficiently smooth, with its Fourier
transform decaying in the standard polynomial way, and that $f_W$ and $f_W g$ be
sufficiently smooth relative to $f_\delta$. The second part of (2.1), and (2.2), imply
that it is always possible to choose the smoothing parameter $T_n$ so as to
satisfy the third part of (2.1).

Theorem 2. If the function $g$ is uniformly bounded, if the errors $\varepsilon_i$
at (1.1) have zero mean and finite variance, and if (2.1)-(2.3) hold, then,
uniformly in $x$,

(2.4) \[ \tilde\psi(x) = \hat\psi(x) + o_p(n^{-1/2}), \qquad \tilde\varphi(x) = \hat\varphi(x) + o_p(n^{-1/2}). \]

Let $\mathcal{I}$ denote an interval for which $\inf_{x \in \mathcal{I}} \psi(x) > 0$. Result (2.4) implies
that, under the additional conditions imposed for Theorem 1, the
estimator $\tilde m = \tilde\varphi/\tilde\psi$, which is an alternative to $\hat m = \hat\varphi/\hat\psi$ discussed in Section 2.1,
satisfies $\tilde m(x) = \hat m(x) + o_p(n^{-1/2})$ uniformly in $x \in \mathcal{I}$. Therefore $\tilde m$ inherits
the weak convergence and semiparametric-efficiency properties of $\hat m$ on $\mathcal{I}$.
Theorem 2 holds, under more restrictive assumptions, in the more general
setting of Section 1.3; see Section 5.3.

3. Simulations. We implemented our estimator $\hat m(x)$ of $m(x)$ on samples
of $(W, Y)$ generated from models of two types:

(1) $g(w) = [3w + 20(2\pi)^{-1/2} \exp(-200(w - 1/2)^2)]\,1_{[0,1]}(w)$, $W \sim U[0, 1]$,
$\varepsilon \sim N(0, \sigma^2)$ and $\delta \sim N(0, \sigma_\delta^2)$ or $\delta \sim U[-1/2, 1/2]$;

(2) $Y \mid W = w \sim \text{Bernoulli}(g(w))$, with $g(w) = \exp(6w)/[1 + \exp(6w)]$,
$W \sim U[-0.5, 0.5]$, $\delta \sim N(0, \sigma_\delta^2)$, or with $g(w) = 0.45 \sin(a\pi w) + 0.5$, $a = 2$ or 4,
$W \sim U[0, 1]$, $\delta \sim N(0, \sigma_\delta^2)$ or $\delta \sim U[-1/2, 1/2]$.

The last example was used by Hobert and Wand [11]. In each case,
we considered several sample sizes ($n = 50$, 100 and 250) and the
parameters $\operatorname{var}(\delta)$ and $\operatorname{var}(\varepsilon)$ were chosen such that the noise-to-signal ratios
$\mathrm{NS}_\delta = \operatorname{var}(\delta)/\operatorname{var}(W)$ and $\mathrm{NS}_\varepsilon = \operatorname{var}(\varepsilon)/\|g\|_\infty$ equal 10%, 25% or 50%. We
considered the situation where the values of $X$ are available as well, which
allowed us to compare our estimators with the $n^{-2/5}$-consistent
Nadaraya-Watson estimator $\hat m_{NW}$ of $m(x)$, based on observations of $(X, Y)$. In all cases,
our estimators based on $(W, Y)$ performed much better than $\hat m_{NW}$, which was
biased and much more variable. These findings continued to hold in the
setting of Section 1.3, where the sample available was $(T_i, X_i, Y_i)$, $i = 1, \ldots, n$,
and the error variance was unknown and estimated by the empirical variance
of the sample $X_i - \hat W_i$, $i = 1, \ldots, n$. More details are available from the first
author's website.
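
As an illustration of how data from model (1) with Gaussian errors might be generated at prescribed noise-to-signal ratios, consider the sketch below. The function names `g_model1` and `simulate_model1` are hypothetical, and the calibration assumes the reading $\mathrm{NS}_\delta = \operatorname{var}(\delta)/\operatorname{var}(W)$ and $\mathrm{NS}_\varepsilon = \operatorname{var}(\varepsilon)/\|g\|_\infty$, with the supremum of $g$ approximated on a fine grid.

```python
import numpy as np


def g_model1(w):
    """Regression function of model (1): linear trend plus a sharp bump at 1/2."""
    w = np.asarray(w, dtype=float)
    bump = 20.0 / np.sqrt(2.0 * np.pi) * np.exp(-200.0 * (w - 0.5) ** 2)
    return (3.0 * w + bump) * ((w >= 0.0) & (w <= 1.0))


def simulate_model1(n, ns_delta, ns_eps, rng):
    """Draw (W, X, Y) with Gaussian errors at the given noise-to-signal ratios."""
    W = rng.uniform(0.0, 1.0, size=n)
    var_delta = ns_delta * (1.0 / 12.0)          # var(W) = 1/12 for U[0, 1]
    sup_g = g_model1(np.linspace(0.0, 1.0, 10001)).max()  # grid approximation
    var_eps = ns_eps * sup_g                     # assumed reading of NS_eps
    Y = g_model1(W) + rng.normal(0.0, np.sqrt(var_eps), size=n)
    X = W + rng.normal(0.0, np.sqrt(var_delta), size=n)
    return W, X, Y


rng = np.random.default_rng(3)
W, X, Y = simulate_model1(250, ns_delta=0.25, ns_eps=0.1, rng=rng)
```

The pairs $(W, Y)$ would feed the proposed estimator, while $(X, Y)$ would feed a Nadaraya-Watson comparison.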

The typical behavior of our estimator is illustrated in Figure 1, where we
compare, for case (1) with uniform $\delta$, $\mathrm{NS}_\varepsilon = 0.1$, $\mathrm{NS}_\delta = 0.25$ and $n = 250$,
the results of 1000 replications of the estimators $\hat m$ with the correct error
density $f_\delta$ and $\hat m$ with $f_\delta$ misspecified (here we used Gaussian error instead
of the uniform error). In both cases, the estimates shown correspond to the
first, fifth and ninth deciles of the ordered 1000 values of the integrated
squared error $\int (\hat m(x) - m(x))^2\,dx$. We see that for small $\mathrm{NS}_\delta$, the
estimator is quite robust to error misspecification, but, unsurprisingly, the
quality deteriorates as the ratio increases. Note, however, that the results
remain quite good for $\mathrm{NS}_\delta = 0.25$.

4. Real data illustration. We illustrate the proposed estimator in the
setting of Section 1.3 on a real data example. The data set was collected
during a South African study on heart disease and was used by Hastie,
Tibshirani and Friedman [10]. The data are available at
www-stat.stanford.edu/ElemStatLearn. During the study, several variables
were measured on males in a heart-disease high-risk region of the Western
Cape, including low density lipoprotein cholesterol (LDL) and total
cholesterol (CHOL) as predictors, and coronary heart disease history (CHD) as
response, coded as 0 = nonincidence of CHD, 1 = incidence of CHD. LDL
is much more difficult to measure than CHOL, which motivates the use
of CHOL as a proxy for LDL (Carroll, Ruppert and Stefanski [4]). After
deleting several outliers, the relationship between LDL and CHOL can be
reasonably well modeled as $\log(\mathrm{CHOL}) = \theta^{(1)} + \theta^{(2)} \log(\mathrm{LDL}) + \delta$ with $\delta$ a
random variable of zero mean; see Carroll, Ruppert and Stefanski [4], who
use the same model for a similar data set. Checking for outliers, we deleted
the observations corresponding to the smallest (resp., two largest) value(s)
of CHOL, the smallest three (resp., largest two) values of LDL, and the eight
points of $(\log(\mathrm{CHOL}), \log(\mathrm{LDL}))$ furthest away from the least squares
line.

Fig. 1. The estimator $\hat m$ with the error density $f_\delta$ known (uniform) or misspecified (Gaussian)
for case (2), with $\mathrm{NS}_\delta = 0.1$ (left panel) or $\mathrm{NS}_\delta = 0.25$ (right panel), with $\mathrm{NS}_\varepsilon = 0.1$ and
$n = 250$. The solid curve is the target curve $m$. (Legend: Known, Misspec.)

We set $Y = \mathrm{CHD}$, $X = \log(\mathrm{CHOL})$ and $W = \hat\theta^{(1)} + \hat\theta^{(2)} \log(\mathrm{LDL})$, where
$\hat\theta^{(1)} = 4.8890$ and $\hat\theta^{(2)} = 0.3663$ are the least squares estimators of $\theta^{(1)}$ and
$\theta^{(2)}$. Our goal is to estimate $m(x) = E(Y \mid X = x)$, the conditional
expectation of incidence of coronary heart disease given the (transformed) total
cholesterol level, using the sample of $n = 446$ observations.

We compare the proposed estimator $\hat m(x)$ with the Nadaraya-Watson
estimator $\hat m_{NW}$. The data suggest that it is reasonable to assume that the
errors $\delta_i = X_i - W_i$ are normal, where the variance can be estimated from the
differences $X_i - W_i$. In Figure 2, we overlay the proposed estimator $\hat m$ and
the Nadaraya-Watson estimator $\hat m_{NW}$ calculated with an appropriate
data-driven cross-validation bandwidth. The graphs suggest that the probability
of coronary heart disease increases with the cholesterol level. The increase
is highly nonlinear, and there are clear differences between the classical
Nadaraya-Watson estimator and the proposed estimator. The
Nadaraya-Watson estimator exhibits additional fluctuations, especially in the right
tail, thus giving a less stable appearance.



Fig. 2. The proposed estimator $\hat m$ and the Nadaraya-Watson estimator $\hat m_{NW}$ based on
the observations of $(X, Y)$, and a scatter plot of the 446 observed values of $(X, Y)$ (left
panel) or the 446 observed values of $(W, Y)$ (right panel), for the coronary heart disease
data. (Legend: Proposed, NW for $(X, Y)$.)
5. Proofs.

5.1. Outline proof of Theorem 1. Define the auxiliary quantities

\[ \tilde Z_n(x) = \sqrt{n} \left[ \frac{\hat\varphi(x) - \varphi(x)}{\psi(x)} - \frac{(\hat\psi(x) - \psi(x))\varphi(x)}{\psi^2(x)} \right], \]

\[ \hat\alpha(x) = n^{-1} \sum_{i=1}^{n} Y_i h(x, W_i), \qquad \alpha_1(x_1, x_2) = \iint y^2 h(x_1, w) h(x_2, w) f_{W,Y}(w, y)\,dw\,dy, \]

\[ \hat\beta(x) = n^{-1} \sum_{i=1}^{n} h(x, W_i), \qquad \beta_1(x_1, x_2) = \int h(x_1, w) h(x_2, w) f_W(w)\,dw. \]



The next two results will be useful to prove the theorem. Their proofs are
given at the end of this section.

Lemma 1. Let $\nu$ be a positive integer and $x \in D$. Under conditions
(A$_{\nu,1}$), (A$_{\nu,2}$) and (A$_4$),

\[ \sqrt{n}(\hat\alpha(x) - \alpha(x)) \Rightarrow Z_\alpha(x), \qquad \sqrt{n}(\hat\beta(x) - \beta(x)) \Rightarrow Z_\beta(x), \]

where $Z_\alpha$, $Z_\beta$ are Gaussian processes characterized by the moments
$E(Z_\alpha(x)) = E(Z_\beta(x)) = 0$, and
$\operatorname{cov}(Z_\alpha(x_1), Z_\alpha(x_2)) = \alpha_1(x_1, x_2) - \alpha(x_1)\alpha(x_2)$,
$\operatorname{cov}(Z_\beta(x_1), Z_\beta(x_2)) = \beta_1(x_1, x_2) - \beta(x_1)\beta(x_2)$, for all $x_1, x_2 \in D$.

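Lemma 2. Let $x_1, \ldots, x_k \in D$. Under conditions (A$_{0,1}$) and (A$_4$), for all
$(t_1, \ldots, t_k)' \in \mathbb{R}^k$, we have $\sum_{j=1}^{k} t_j \tilde Z_n(x_j) \to N(0, t'\Sigma t)$ in distribution, where

\[ \Sigma_{jl} = \frac{\varphi_1(x_j, x_l)}{\psi(x_j)\psi(x_l)} + \frac{\varphi(x_j)\varphi(x_l)\psi_1(x_j, x_l)}{\psi^2(x_j)\psi^2(x_l)} - \frac{\varphi(x_l)\mu(x_j, x_l)}{\psi(x_j)\psi^2(x_l)} - \frac{\varphi(x_j)\mu(x_j, x_l)}{\psi(x_l)\psi^2(x_j)}. \]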

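Put $Z_n(x) = \sqrt{n}(\hat m(x) - m(x)) = X_n(x) + Y_n(x)$, where $\hat\psi(x) X_n(x) = \sqrt{n}(\hat\varphi(x) - \varphi(x))$
and $\psi(x)\hat\psi(x) Y_n(x) = -\sqrt{n}(\hat\psi(x) - \psi(x))\varphi(x)$. It suffices to prove
(a) convergence of the finite-dimensional limit distribution of $Z_n$, and (b)
tightness of $Z_n$. To establish (a), note that

(5.1) \[ Z_n(x) = \tilde Z_n(x) - \sqrt{n}\, \frac{\hat\psi(x) - \psi(x)}{\psi(x)} \left\{ \frac{\hat\varphi(x)}{\hat\psi(x)} - \frac{\varphi(x)}{\psi(x)} \right\}. \]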
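Now

\[ \sup_{x \in D} \left| \frac{\hat\varphi(x)}{\hat\psi(x)} - \frac{\varphi(x)}{\psi(x)} \right| \le \frac{\sup_{x \in D} |\hat\varphi(x) - \varphi(x)|}{\inf_{x \in D} |\hat\psi(x)|} + \frac{\sup_{x \in D} |\varphi(x)| \cdot \sup_{x \in D} |\hat\psi(x) - \psi(x)|}{\inf_{x \in D} |\hat\psi(x)\psi(x)|}, \]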

where $\inf_{x \in D} |\hat\psi(x)| \to \inf_{x \in D} |\psi(x)| > 0$, which, combined with Lemma 1,
proves that the last term of (5.1) tends to zero as $n$ tends to infinity, and
thus $Z_n(x)$ has the same finite-dimensional limit distribution as $\tilde Z_n(x)$. From
Lemma 2 and the Cramér-Wold device, this limit distribution is the same
as that claimed for $Z$ in Theorem 1. To prove (b), note that, by the proof of
Lemma 1, the sequences $\sqrt{n}(\hat\varphi(x) - \varphi(x))$ and $\sqrt{n}(\hat\psi(x) - \psi(x))$ are tight.
The sequence $\hat\varphi(x)/\hat\psi(x)$ is tight if we show that for given $\varepsilon, \eta > 0$ and
sufficiently small $\delta$ and large $n$,

(5.2) \[ P\Bigl( \sup_{|x - y| < \delta} \bigl| \hat\varphi(x)/\hat\psi(x) - \hat\varphi(y)/\hat\psi(y) \bigr| > \varepsilon \Bigr) < \eta. \]



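Now, defining $\xi(x) = \iint y f_{W,Y}(w, y) f_\delta'(x - w)\,dw\,dy$, $\zeta(x) = \int f_W(w) f_\delta'(x - w)\,dw$,
$\hat\xi(x) = n^{-1} \sum_{i=1}^{n} Y_i f_\delta'(x - W_i)$ and $\hat\zeta(x) = n^{-1} \sum_{i=1}^{n} f_\delta'(x - W_i)$, let
$\hat T(x) = [\hat\xi(x)\hat\psi(x) - \hat\varphi(x)\hat\zeta(x)]/\hat\psi^2(x)$. By the mean value theorem, the
left-hand side of (5.2) is bounded by $P(\sup_{x \in D} |\hat T(x)| > \varepsilon/\delta)$ and (5.2) follows if
we note that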



sup |T(x)| < sup \£(x) 
xeD xeD 



^x)\/\iP(x)\ + S up\^x)\/\iP(x)\ 



■ sup \<p(x) - ip{x)\/\ip(x)\ 2 + sup \ip(x)\/\ip(x)\' 

igD x£D 



sup |C(x) - C(x)\/\^(x)\ 2 + sup \((x)\/\^(x)\< 

xeD xS-D 

which tends to zero as n tends to infinity. Property (b) follows. 

Proof of Lemma 1. We prove the result for $\alpha$; the proof for $\beta$ is
analogous. Let $x_1, \ldots, x_k \in D$, $\hat\alpha = (\hat\alpha(x_1), \ldots, \hat\alpha(x_k))'$, $\alpha = (\alpha(x_1), \ldots, \alpha(x_k))'$
and $Z_\alpha \sim N_k(0, \Sigma_\alpha)$, where $(\Sigma_\alpha)_{ij} = \alpha_1(x_i, x_j) - \alpha(x_i)\alpha(x_j)$. Applying
the central limit theorem to the i.i.d. sequence $T_1, \ldots, T_n$, with
$T_i = \sum_{j=1}^{k} t_j Y_i h(x_j, W_i)$, it is not hard to prove that, for all $t = (t_1, \ldots, t_k)' \in \mathbb{R}^k$,
$\sqrt{n}\, t'(\hat\alpha - \alpha) \to t' Z_\alpha$ in distribution. From the Cramér-Wold device, we deduce that
$\sqrt{n}(\hat\alpha - \alpha) \to Z_\alpha$ in distribution, which implies weak convergence of the finite-dimensional
distributions. Using uniform Lipschitz continuity of $h$ in the first
coordinate, one can show that $E\{\sqrt{n}[\hat\alpha(x_1) - \alpha(x_1) - \hat\alpha(x_2) + \alpha(x_2)]\}^2 \le c|x_1 - x_2|^2$,
which implies tightness of $\sqrt{n}(\hat\alpha - \alpha)$. □
Proof of Lemma 2. Since $E\hat\varphi(x) = \varphi(x)$ and $E\hat\psi(x) = \psi(x)$, we have
that $E(\tilde Z_n(x)) = 0$. The result follows from the central limit theorem if
we note that $\sqrt{n} \sum_{j=1}^{k} t_j \tilde Z_n(x_j)$ may be written as $\sum_{i=1}^{n} T_i$, where
$T_i = \sum_{j=1}^{k} t_j f_\delta(x_j - W_i)[Y_i/\psi(x_j) - \varphi(x_j)/\psi^2(x_j)]$. □

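5.2. Outline proof of Theorem 2. We shall derive the second result at
(2.4); a proof of the first result there is similar. Define the functions
$a = (f_W g)^{\mathrm{ft}}$, $\hat a = \widehat{(f_W g)}^{\mathrm{ft}}$, $b = (f_\delta^{\mathrm{ft}})^2$, $\hat b = (\hat f_\delta^{\mathrm{ft}})^2$, $c = f_\delta^{\mathrm{ft}}$, $\Delta_a = \hat a - a$ and $\Delta_b = \hat b - b$.
Let $\mathcal{T}$ denote the interval $[-T_n, T_n]$, write $\tilde{\mathcal{T}}$ for the complement in $\mathbb{R}$ of $\mathcal{T}$,
and put $u_x(t) = e^{-itx}$. Then, uniformly in $x$,

\[ 2\pi\tilde\varphi(x) = \int_{\mathcal{T}} \hat a\, |\hat b|^{1/2} u_x = \int_{\mathcal{T}} (a + \Delta_a) |b|^{1/2} (1 + b^{-1}\Delta_b)^{1/2} u_x \]
\[ = \int_{\mathcal{T}} (a + \Delta_a) c\, u_x + O_p\Bigl\{ \int_{\mathcal{T}} |a/c| (E\Delta_b^2)^{1/2} + \int_{\mathcal{T}} c^{-1} (E\Delta_a^2\, E\Delta_b^2)^{1/2} \Bigr\}. \]

Using the fact that $\hat a$ equals a sum of $n$ independent and identically
distributed random variables, and $\hat b$ is expressed in a form similar to a
U-statistic, it can be shown that $E[\Delta_a(t)^2] = O(n^{-1})$ and $E[\Delta_b(t)^2] = O(N^{-1})$,
uniformly in $t$. Moreover, (2.2) and (2.3) imply that $\int_{\mathcal{T}} |a/c| = O(1)$,
$\int_{\mathcal{T}} c^{-1} = O(T_n^{\lambda_\delta + 1})$, $\int_{\tilde{\mathcal{T}}} a c\, u_x = O(T_n^{1 - \lambda - \lambda_\delta})$ and
$\int_{\tilde{\mathcal{T}}} \Delta_a c\, u_x = O_p(n^{-1/2} T_n^{1 - \lambda_\delta})$, the latter
two results holding uniformly in $x$. Therefore, uniformly in $x$,

(5.3) \[ 2\pi\tilde\varphi(x) = \int (a + \Delta_a) c\, u_x + O_p\bigl( N^{-1/2} + n^{-1/2} N^{-1/2} T_n^{\lambda_\delta + 1} + T_n^{1 - \lambda - \lambda_\delta} + n^{-1/2} T_n^{1 - \lambda_\delta} \bigr) = \int \hat a\, c\, u_x + o_p(n^{-1/2}). \]

Since $\hat\varphi(x) = (2\pi)^{-1} \int \hat a\, c\, u_x$, then the second part of (2.4) follows from (5.3).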

5.3. Case where $X = p(T \mid \theta) + \delta$. This generalization, in which $(T, Y)$
rather than $(W, Y)$ is observed, was introduced in Section 1.3. There we
noted that the unknown parameter $\theta$ could be estimated by least squares
from data $(T_i', X_i')$, for $1 \le i \le r$, on $(T, X)$. In the case of a linear model,
$p(t \mid \theta) = \theta^{(1)} + \theta^{(2)} t$, and our estimator of $W_i = \theta^{(1)} + \theta^{(2)} T_i$ is $\hat W_i = \hat\theta^{(1)} + \hat\theta^{(2)} T_i$.
We shall treat this particular case below; other models for $p$ can be addressed
similarly.

Let rh*, (p* , ip* and ip* denote the versions of m, ip, tp and ip, respectively, 
obtained on replacing $W_i$ by $\hat W_i$ throughout. It will be assumed that $n = O(r)$.
In this case the least squares estimators $\hat\theta^{(1)}$ and $\hat\theta^{(2)}$ are $\sqrt{n}$-consistent.
First we consider the setting where $f_\delta$ is known. Provided $f_\delta$ has two
bounded derivatives, we may write

(5.4) \[ \hat\varphi^*(x) = n^{-1} \sum_{i=1}^{n} Y_i f_\delta(x - \hat W_i) = \hat\varphi(x) - (\hat\theta^{(1)} - \theta^{(1)}) E\{Y f_\delta'(x - W)\} - (\hat\theta^{(2)} - \theta^{(2)}) E\{T Y f_\delta'(x - W)\} + o_p(n^{-1/2}), \]

(5.5) \[ \hat\psi^*(x) = n^{-1} \sum_{i=1}^{n} f_\delta(x - \hat W_i) = \hat\psi(x) - (\hat\theta^{(1)} - \theta^{(1)}) E\{f_\delta'(x - W)\} - (\hat\theta^{(2)} - \theta^{(2)}) E\{T f_\delta'(x - W)\} + o_p(n^{-1/2}). \]

Here, $\hat\varphi$ and $\hat\psi$ are the original estimators of $\varphi$ and $\psi$ given in Section 1.2 for
the case where $W_i$ is directly observed; $W = \theta^{(1)} + \theta^{(2)} T$; and the remainder
terms $o_p(n^{-1/2})$ are uniform in $x$, provided the conditions of Theorem 2 hold
and, in addition, $E(T^2) < \infty$.

It follows from (5.4) and (5.5) that $\hat\varphi^*$ and $\hat\psi^*$ are $\sqrt{n}$-consistent for $\varphi$
and $\psi$, respectively, and $\hat m^* = \hat\varphi^*/\hat\psi^*$ is $\sqrt{n}$-consistent for $m$. A version of
Theorem 1 is readily obtained in this setting, using (5.4) and (5.5). Unless
$r/n \to \infty$, the covariance structure of the limiting Gaussian process depends
on whether the data $(T_i', X_i')$, from which $\hat\theta^{(1)}$ and $\hat\theta^{(2)}$ are computed, are
independent of the data $(W_i, Y_i)$ used to calculate $\hat\varphi$ and $\hat\psi$, or whether
$(T_i', X_i') = (T_i, X_i)$ and the triples $(T_i, X_i, Y_i)$ are observed.

The case where $f_\delta$ is not known, and is consistently estimated from
replicated data as discussed in Section 1.2, is similar although more complex.
Our estimator $\hat f_\delta^{\mathrm{ft}}$, given at (1.6), does not alter since it does not use the
data $W_i$. On the other hand, the estimators $\hat f_W^{\mathrm{ft}}$ and $\widehat{(f_W g)}^{\mathrm{ft}}$, given at (1.7),
are replaced by

\[ \hat f_W^{\mathrm{ft}*}(t) = n^{-1} \sum_{j=1}^{n} \exp(it\hat W_j), \qquad \widehat{(f_W g)}^{\mathrm{ft}*}(t) = n^{-1} \sum_{j=1}^{n} Y_j \exp(it\hat W_j). \]

Substituting the latter for $\hat f_W^{\mathrm{ft}}$ and $\widehat{(f_W g)}^{\mathrm{ft}}$, respectively, in (1.8);
Taylor-expanding $\exp(it\hat W_j)$ as $\exp(itW_j)\{1 + it(\hat W_j - W_j) + \cdots\}$; and taking the
smoothing parameter $T_n$ in (1.8) to be of order $n^{(1/2) - 2\eta}$, for some $\eta > 0$
[so that, under moment conditions on $W_j$, $T_n \sup_{j \le n} |\hat W_j - W_j| = O_p(n^{-\eta})$],
we may deduce that (2.4) continues to hold if $\tilde\varphi$ and $\tilde\psi$ there are replaced
by $\tilde\varphi^*$ and $\tilde\psi^*$, provided more restrictive assumptions than those given in
Theorem 2 are imposed.